{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "R6gHiH-I7uFa"
   },
   "source": [
    "# Ungraded Lab: Improving Computer Vision Accuracy using Convolutions\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Z6396DKnr-xp"
   },
   "source": [
    "# Shallow Neural Network\n",
    "\n",
    "In the previous lessons, you saw how to do fashion recognition using a neural network containing three layers -- the input layer (in the shape of the data), the output layer (in the shape of the desired output) and only one hidden layer. You experimented with the impact of different sizes of hidden layer and number of training epochs on the final accuracy. For convenience, here's the entire code again. Run it and take a note of the test accuracy that is printed out at the end. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qnCNAG-VecJ9"
   },
   "outputs": [],
   "source": [
    "# First import all the libraries you will need\n",
    "import tensorflow as tf\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Load the Fashion MNIST dataset\n",
    "fmnist = tf.keras.datasets.fashion_mnist\n",
    "(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()\n",
    "\n",
    "# Normalize the pixel values\n",
    "training_images = training_images / 255.0\n",
    "test_images = test_images / 255.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_Note: You can safely ignore this warning message (here and in other notebooks) when you start training: `WARNING: All log messages before absl::InitializeLog() is called are written to STDERR`_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "xcsRtq9OLorS"
   },
   "outputs": [],
   "source": [
    "# Define the model\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(28,28,1)),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(128, activation=tf.nn.relu),\n",
    "    tf.keras.layers.Dense(10, activation=tf.nn.softmax)\n",
    "])\n",
    "\n",
    "# Setup training parameters\n",
    "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n",
    "\n",
    "# Train the model\n",
    "print(\"\\nMODEL TRAINING:\")\n",
    "model.fit(training_images, training_labels, epochs=5)\n",
    "\n",
    "# Evaluate on the test set\n",
    "print(\"\\nMODEL EVALUATION:\")\n",
    "test_loss, test_accuracy = model.evaluate(test_images, test_labels, verbose=0)\n",
    "print(f'test set accuracy: {test_accuracy}')\n",
    "print(f'test set loss: {test_loss}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zldEXSsF8Noz"
   },
   "source": [
    "## Convolutional Neural Network\n",
    "\n",
    "In the model above, your accuracy will probably be about 89% on training and 87% on validation. Not bad. But how do you make that even better? One way is to use something called _convolutions_. The main concept of convolutions is that they narrow down the content of the image to focus on specific parts (for more detailed explanation please see the resources in the classroom). This will likely improve the model accuracy. \n",
    "\n",
    "If you've ever done image processing using a filter (like [this](https://en.wikipedia.org/wiki/Kernel_(image_processing))), then convolutions will look very familiar. In short, you take an array (usually 3x3 or 5x5) and scan it over the entire image. By changing the underlying pixels based on the formula within that matrix, you can do things like edge detection. So, for example, if you look at the above link, you'll see a 3x3 matrix that is defined for edge detection where the middle cell is 8, and all of its neighbors are -1. In this case, for each pixel, you would multiply its value by 8, then subtract the value of each neighbor. Do this for every pixel, and you'll end up with a new image that has the edges enhanced.\n",
    "\n",
    "This is perfect for computer vision because it often highlights features that distinguish one item from another. Moreover, the amount of information needed is then much less because you'll just train on the highlighted features.\n",
    "\n",
    "That's the concept of **Convolutional Neural Networks**. Add some layers to do convolution before you have the dense layers, and then the information going to the dense layers is more focused and possibly more accurate.\n",
    "\n",
    "Run the code below. This is the same neural network as earlier, but this time with [Convolution](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) and [MaxPooling](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D) layers added first. It will take longer, but look at the impact on the accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "C0tFgT1MMKi6"
   },
   "outputs": [],
   "source": [
    "# Define the model\n",
    "model = tf.keras.models.Sequential([\n",
    "                                                         \n",
    "    # Add convolutions and max pooling\n",
    "    tf.keras.Input(shape=(28,28,1)),\n",
    "    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),\n",
    "    tf.keras.layers.MaxPooling2D(2, 2),\n",
    "    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),\n",
    "    tf.keras.layers.MaxPooling2D(2,2),\n",
    "\n",
    "    # Add the same layers as before\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(128, activation='relu'),\n",
    "    tf.keras.layers.Dense(10, activation='softmax')\n",
    "])\n",
    "\n",
    "# Print the model summary\n",
    "model.summary()\n",
    "\n",
    "# Use same settings\n",
    "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n",
    "\n",
    "# Train the model\n",
    "print(\"\\nMODEL TRAINING:\")\n",
    "model.fit(training_images, training_labels, epochs=5)\n",
    "\n",
    "# Evaluate on the test set\n",
    "print(\"\\nMODEL EVALUATION:\")\n",
    "test_loss, test_accuracy = model.evaluate(test_images, test_labels, verbose=0)\n",
    "print(f'test set accuracy: {test_accuracy}')\n",
    "print(f'test set loss: {test_loss}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uRLfZ0jt-fQI"
   },
   "source": [
    "It's likely gone up to about 92% on the training data and 90% on the validation data. That's significant, and a step in the right direction!\n",
    "\n",
    "Look at the code again, and see, step by step how the convolutions were built. Instead of the input layer at the top, you added a [Conv2D layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D). The parameters are:\n",
    "\n",
    "1. The number of convolutions you want to generate. The value here is purely arbitrary but it's good to use powers of 2 starting from 32.\n",
    "2. The size of the Convolution. In this case, a 3x3 grid.\n",
    "3. The activation function to use. In this case, you used a ReLU, which you might recall is the equivalent of returning `x` when `x>0`, else return `0`.\n",
    "4. In the first layer, the shape of the input data.\n",
    "\n",
    "You'll follow the convolution with a [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D) layer which is designed to compress the image, while maintaining the content of the features that were highlighted by the convolution. By specifying `(2,2)` for the MaxPooling, the effect is to quarter the size of the image. Without going into too much detail here, the idea is that it creates a 2x2 array of pixels, and picks the biggest one. Thus, it turns 4 pixels into 1. It repeats this across the image, and in doing so, it halves both the number of horizontal and vertical pixels, effectively reducing the image to 25% of the original image.\n",
    "\n",
    "You can call `model.summary()` to see the size and shape of the network, and you'll notice that after every max pooling layer, the image size is reduced in this way. \n",
    "\n",
    "\n",
    "```\n",
    "model = tf.keras.models.Sequential([\n",
    "  tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)),\n",
    "  tf.keras.layers.MaxPooling2D(2, 2),\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RMorM6daADjA"
   },
   "source": [
    "Then you added another convolution and flattened the output.\n",
    "\n",
    "\n",
    "\n",
    "```\n",
    "  tf.keras.layers.Conv2D(64, (3,3), activation='relu'),\n",
    "  tf.keras.layers.MaxPooling2D(2,2)\n",
    "  tf.keras.layers.Flatten(),\n",
    "  \n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qPtqR23uASjX"
   },
   "source": [
    "After this, you'll just have the same DNN structure as the non convolutional version. The same dense layer with 128 neurons, and output layer with 10 neurons as in the pre-convolution example:\n",
    "\n",
    "\n",
    "\n",
    "```\n",
    "  tf.keras.layers.Dense(128, activation='relu'),\n",
    "  tf.keras.layers.Dense(10, activation='softmax')\n",
    "])\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Np6AjBlLYveu"
   },
   "source": [
    "## About overfitting\n",
    "\n",
    "Try running the training for more epochs -- say about 20, and explore the results. But while the results might seem really good, the validation results may actually go down, due to something called _overfitting_. In a nutshell, overfitting occurs when the network learns the data from the training set really well, but it's too specialised to only that data, and as a result is less effective at interpreting other unseen data. For example, if all your life you only saw red shoes, then when you see a red shoe you would be very good at identifying it. But blue suede shoes might confuse you... and you know you should never mess with my blue suede shoes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IXx_LX3SAlFs"
   },
   "source": [
    "# Visualizing the Convolutions and Pooling\n",
    "\n",
    "Next you can explore how to show the convolutions graphically. The cell below prints the first 100 labels in the test set, and you can see that the ones at index `0`, index `23` and index `28` are all the same value (i.e. `9`). They're all shoes. Take a look at the result of running the convolution on each, and you'll begin to see common features between them emerge. Now, when the dense layer is training on the output of the convolution layers, it's working with a lot less, and it's perhaps finding a commonality between shoes based on this convolution/pooling combination.\n",
    "\n",
    "_Note: You might get different results here compared to the screencast because of randomness in the model training. Nonetheless, it's a good exercise on how you can visualize the results of your current training run._"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "f-6nX4QsOku6"
   },
   "outputs": [],
   "source": [
    "print(f\"First 100 labels:\\n\\n{test_labels[:100]}\")\n",
    "\n",
    "print(f\"\\nShoes: {[i for i in range(100) if test_labels[:100][i]==9]}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9FGsHhv6JvDx"
   },
   "outputs": [],
   "source": [
    "FIRST_IMAGE=0\n",
    "SECOND_IMAGE=23\n",
    "THIRD_IMAGE=28\n",
    "CONVOLUTION_NUMBER = 1\n",
    "layers_to_visualize = [tf.keras.layers.Conv2D, tf.keras.layers.MaxPooling2D]\n",
    "\n",
    "layer_outputs = [layer.output for layer in model.layers if type(layer) in layers_to_visualize]\n",
    "activation_model = tf.keras.models.Model(inputs = model.inputs, outputs=layer_outputs)\n",
    "\n",
    "f, axarr = plt.subplots(3,len(layer_outputs))\n",
    "\n",
    "for x in range(len(layer_outputs)):\n",
    "    f1 = activation_model.predict(test_images[FIRST_IMAGE].reshape(1, 28, 28, 1), verbose=False)[x]\n",
    "    axarr[0,x].imshow(f1[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')\n",
    "    axarr[0,x].grid(False)\n",
    "  \n",
    "    f2 = activation_model.predict(test_images[SECOND_IMAGE].reshape(1, 28, 28, 1), verbose=False)[x]\n",
    "    axarr[1,x].imshow(f2[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')\n",
    "    axarr[1,x].grid(False)\n",
    "  \n",
    "    f3 = activation_model.predict(test_images[THIRD_IMAGE].reshape(1, 28, 28, 1), verbose=False)[x]\n",
    "    axarr[2,x].imshow(f3[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')\n",
    "    axarr[2,x].grid(False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8KVPZqgHo5Ux"
   },
   "source": [
    "### EXERCISES\n",
    "\n",
    "1. Try editing the convolutions. Change the 64s to either 16 or 32. What impact will this have on accuracy and/or training time?\n",
    "\n",
    "2. Remove the final Convolution. What impact will this have on accuracy or training time?\n",
    "\n",
    "3. How about adding more Convolutions? What impact do you think this will have? Experiment with it.\n",
    "\n",
    "4. Remove all Convolutions but the first. What impact do you think this will have? Experiment with it. \n",
    "\n",
    "5. In the 2nd ungraded lab last week, you implemented a callback to check on the loss function and to cancel training once it hit a certain amount. See if you can implement that here."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "name": "C1_W3_Lab_1_improving_accuracy_using_convolutions.ipynb",
   "private_outputs": true,
   "provenance": [
    {
     "file_id": "https://github.com/https-deeplearning-ai/tensorflow-1-public/blob/25_august_2021_fixes/C1/W3/ungraded_labs/C1_W3_Lab_1_improving_accuracy_using_convolutions.ipynb",
     "timestamp": 1638957936408
    }
   ],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0rc1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
